Examples  of  Distributions. 


(a) CONTINUOUS VARIATION.


Symmetrical Distribution: Stature in Man.*
Slightly Asymmetrical: Weight in Man.†
Markedly Asymmetrical: Age Distribution, Cases of Scarlet Fever;‡ Rental of Houses.§

Stature in  Nos. in      Group     Weight in  Nos. in                 No. of               No. of Houses
Inches.     each Class.  totals.   Lbs.       each Class.   Age.      Cases.   Rent.       in Thousands.

51-58            2                 90-100          2        0-1         246    under £10       3175
58-59            4                 100-110        34        1-2         773    £10-20          1451
59-60           14          22     110-120       152        2-3        1399    £20-30           442
60-61           41                 120-130       390        3-4        1874    £30-40           260
61-62           83                 130-140       867        4-5        2009    £40-50           151
62-63          169         294     140-150      1623        5-6        1931    £50-60            90
63-64          394                 150-160      1559        6-7        1704    £60-80           104
64-65          669                 160-170      1326        7-8        1533    £80-100           47
65-66          990        2053     170-180       787        8-9        1236    Above £100       110
66-67         1223                 180-190       476        9-10       1014
67-68         1329                 190-200       263        10-15      2921
68-69         1230        3782     200-210       107        15-20       921
69-70         1063                 210-220        85        20-25       417
70-71          646                 220-230        41        25-35       327
71-72          392        2101     230-240        16        35-45        85
72-73          202                 240-250        11        45-          32
73-74           79                 250-260         8
74-75           32         313     260-270         1
75-76           16                 270-280       ...
76-77            6                 280-290         1
77-78            2          23

Totals,       8388                              7749
* Report B.A., 1883, p. 256.  † Report B.A., 1883.  ‡ Manchester Health Reports.
§ Goschen, quoted by Pearson, Phil. Trans. Roy. Soc. vol. 186 A, pp. 343-414.


(b)  DISCRETE  VARIATION. 


Number of Sepals in Flowers of Anemone nemorosa.*        Number of Petals in Ranunculus bulbosus.†

No. of     No. of Instances.   No. of Instances.          No. of     No. of
Sepals.    Example (a).        Example (β).               Petals.    Instances.

  4            ...                   3                       5         133
  5              7                  31                       6          55
  6            515                 657                       7          23
  7            419                 271                       8           7
  8             49                  35                       9           2
  9             13                   2                      10           2
 10              1                   1                      11         ...
 11              1                 ...
 12            ...                 ...

* Yule, Biometrika, vol. i. p. 308.
† De Vries, quoted by Pearson, loc. cit.


Constants  of  Distributions. 


(1) Mean. If there be a number of quantities of definite measurement, then the
term mean is used to denote the sum of these measurements divided by the total number
of the quantities, or if $x_1, x_2, \ldots, x_n$ be the measurements, n in number, and M the mean,

$$M = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

The term mean as used in statistics is equivalent to the term arithmetical mean
in algebra.


(2) Median. The median is the central value of the group when the measurements
are arranged in order of magnitude, so that the number of instances above the median
is equal to that below the median. If the groups are at all numerous, it is most easily
calculated by simple proportion. Thus, taking the weights of British adults, we find
7749 instances, so that 3874.5 lie on each side of the median. Of these 3068 are under
150 pounds, a defect of 806.5 below the median, and 3122 above 160 pounds, so that the
median falls in the group 150-160 and will be very approximately given by

$$\text{Median} = 150 + \frac{806.5}{1559} \times 10 = 155.2 \text{ lbs.}$$

The first number is the weight at which the group begins: the multiplier 10 is the
value of the group difference, and the fraction the proportional number of the group
1559 to be expected.
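The simple proportion used for the median can be sketched in a few lines of Python. This is only an illustration: the function name `grouped_median` is ours, and the counts are the weight classes of the table of continuous variation.

```python
def grouped_median(lower_bounds, width, counts):
    """Median of grouped data by simple proportion inside the median group."""
    half = sum(counts) / 2          # number of instances on each side of the median
    below = 0                       # instances in the groups already passed
    for lower, count in zip(lower_bounds, counts):
        if count > 0 and below + count >= half:
            # defect below the median, as a fraction of the group, times the group width
            return lower + (half - below) / count * width
        below += count
    raise ValueError("empty distribution")

# Weight classes 90-100, 100-110, ..., 280-290 lbs.
weight_counts = [2, 34, 152, 390, 867, 1623, 1559, 1326, 787, 476,
                 263, 107, 85, 41, 16, 11, 8, 1, 0, 1]
weight_lowers = [90 + 10 * i for i in range(len(weight_counts))]

median_weight = grouped_median(weight_lowers, 10, weight_counts)
print(round(median_weight, 1))
```

The loop stops in the group 150-160, where 3068 instances lie below and 806.5 more are wanted.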

(3) Mode. This is the most frequent group in asymmetrical distributions, and in
symmetrical distributions coincides with the mean and the median. The group in which
the mode is situated can usually be easily seen, and if the middle point of this be denoted
by zero, the distance of the mode from this, in units of the group difference, can be
calculated approximately by the formula

$$\text{Mode} = \frac{1}{2}\,\frac{m_1 - m_3}{m_1 - 2m_2 + m_3},$$

where $m_1, m_2, m_3$ are the numbers included in the successive groups, $m_2$ being that of
the group in which the mode is expected.

More accurately, the median in general lies between the mode and the mean, so that
its distance from the former is twice that from the latter, that is

2(Mean - Median) = Median - Mode,

or Mode = 3 × Median - 2 × Mean

= 150.8 lbs. in the case previously considered.

By the formula given,

$$\text{Mode} = 145 + \frac{-692}{2 \times (-820)} \times 10 = 149.2 \text{ lbs.},$$

or one per cent. of difference.
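As a check on the arithmetic, the three-group formula for the mode may be written out as below. The function name `grouped_mode` is an illustrative choice; the three counts are those of the weight groups 130-140, 140-150 and 150-160.

```python
def grouped_mode(mid_of_modal_group, width, m1, m2, m3):
    """Approximate mode from the modal group (m2) and its two neighbours."""
    # distance of the mode from the middle of the modal group, in group widths
    offset = (m1 - m3) / (2 * (m1 - 2 * m2 + m3))
    return mid_of_modal_group + offset * width

# Modal group 140-150 lbs (middle point 145), neighbours 130-140 and 150-160.
mode_weight = grouped_mode(145, 10, 867, 1623, 1559)
print(round(mode_weight, 1))
```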


(4) Standard deviation. This expresses the degree of scatter in the distribution:
thus, take two distributions (a) and (b), as shown in the table below, of equal numbers,
having the same mean. Let the measurements be 1, 2, 3, 4, 5.

            Number.                        No. × sqr. of Deviations.
Size.      (a)     (b)     Deviations.       (a)       (b)

  1        ...       1         -2            ...         4
  2          4       4         -1              4         4
  3          8       6          0              0         0
  4          4       4          1              4         4
  5        ...       1          2            ...         4

Totals,     16      16                         8        16

Both these groups have the same mean measuring 3, but the variation in the latter
is much greater than in the former. This difference is measured by the standard deviation,
which is defined as the square root of the mean of the squares of all the deviations, the
latter being measured from the mean value of the quantity. In the above example the
mean is at 3, therefore the deviations are as given in the fourth column. In the fifth
and sixth columns the squares of the deviations are multiplied by the number of each
group occurring. The sums are 8 and 16, so that the respective standard deviations are

$$\sqrt{\tfrac{8}{16}} = .707 \quad \text{and} \quad \sqrt{\tfrac{16}{16}} = 1.$$

The standard deviation is usually denoted by σ.
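The calculation for the two small distributions (a) and (b) can be sketched as follows. The helper `mean_and_sd` is a hypothetical name; the last two lines also form 100 σ / M, the coefficient of variation defined in paragraph (6).

```python
from math import sqrt

def mean_and_sd(sizes, counts):
    """Mean and standard deviation of a frequency distribution."""
    n = sum(counts)
    mean = sum(s * c for s, c in zip(sizes, counts)) / n
    # mean of the squared deviations, each weighted by the number in its group
    mean_square = sum(c * (s - mean) ** 2 for s, c in zip(sizes, counts)) / n
    return mean, sqrt(mean_square)

sizes = [1, 2, 3, 4, 5]
mean_a, sd_a = mean_and_sd(sizes, [0, 4, 8, 4, 0])   # distribution (a)
mean_b, sd_b = mean_and_sd(sizes, [1, 4, 6, 4, 1])   # distribution (b)

cv_a = 100 * sd_a / mean_a   # coefficient of variation of (a)
cv_b = 100 * sd_b / mean_b   # coefficient of variation of (b)
```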


(5) Skewness. This measures the degree of asymmetry of a distribution, and is
defined as the ratio of the distance between the mode and the mean to the
standard deviation, i.e.

$$\text{Sk.} = \frac{\text{Mode} - \text{Mean}}{\text{Standard Deviation}} = \frac{\text{Mode} - \text{Mean}}{\sigma} \text{ in the usual notation.}$$


(6) Coefficient of Variation. This coefficient is defined as 100 times the ratio of the
standard deviation to the mean and is denoted by v, so that

$$v = 100\,\frac{\sigma}{M}.$$

This coefficient allows comparisons to be made; for a variation of 2 inches above or
below the mean is very much greater when the mean is equal to 10 inches than when
it is equal to 40 inches.


Method  of  Calculating  the  Mean  and  Standard  Deviation 
of  a Series  of  Observations. 


Example. Theoretical number of aces where three cards, one of which is an
ace, are dealt in groups of seven.

No. of    No. of          Deviation from
Aces.     instances x.    chosen zero f.     x × f     x × f²     x × f³     x × f⁴

  0           128              -2            -256        512      -1024       2048
  1           448              -1            -448        448       -448        448
  2           672               0            -704        960      -1472       2496
  3           560               1             560        560        560        560
  4           280               2             560       1120       2240       4480
  5            84               3             252        756       2268       6804
  6            14               4              56        224        896       3584
  7             1               5               5         25        125        625

                                             1433       2685       6089      16053
                                             -704       +960      -1472      +2496

Total,       2187                             729       3645       4617      18549

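First choose by inspection an origin at a point as near as possible to the mean.
Then $\nu_1, \nu_2, \nu_3, \nu_4$ denote the first, second, third, and fourth moments round this
chosen origin, so that

$$\nu_1 = \frac{\Sigma x \times f}{N}, \qquad \nu_2 = \frac{\Sigma x \times f^2}{N}, \quad \text{etc.},$$

where N denotes the total number of instances.
In this case,

$$\nu_1 = \frac{729}{2187} = \frac{1}{3}, \quad \nu_2 = \frac{3645}{2187} = \frac{5}{3}, \quad \nu_3 = \frac{4617}{2187} = \frac{19}{9}, \quad \nu_4 = \frac{18549}{2187} = \frac{229}{27}.$$

Here the mean is at a distance $\nu_1$ from the chosen origin, or at
2 + .33 or 2.33 units from the real zero.

The moments about the mean are the most important, and are denoted by
$\mu_2, \mu_3, \mu_4$, $\mu_1$ being equal to zero by definition. These are obtained from the
preceding moments by the formulae:

$$\mu_2 = \nu_2 - \nu_1^2 \quad \text{(very important)}$$

$$\mu_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$$

$$\mu_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$$

In this case,

$$\mu_2 = \frac{5}{3} - \left(\frac{1}{3}\right)^2 = \frac{14}{9},$$

$$\mu_3 = \frac{19}{9} - 3 \cdot \frac{1}{3} \cdot \frac{5}{3} + 2\left(\frac{1}{3}\right)^3 = \frac{14}{27},$$

$$\mu_4 = \frac{229}{27} - 4 \cdot \frac{1}{3} \cdot \frac{19}{9} + 6\left(\frac{1}{3}\right)^2 \frac{5}{3} - 3\left(\frac{1}{3}\right)^4 = \frac{182}{27}.$$

The standard deviation σ is equal to $\sqrt{\mu_2}$.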
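The whole calculation of the moments for the example of the aces can be followed in exact fractions. The sketch below assumes only the counts of the table and the chosen origin at 2 aces; the helper name `nu` is ours.

```python
from fractions import Fraction

counts = [128, 448, 672, 560, 280, 84, 14, 1]   # instances with 0, 1, ..., 7 aces
origin = 2                                       # chosen zero, near the mean
N = sum(counts)

def nu(k):
    """k-th moment round the chosen origin."""
    return Fraction(sum(c * (aces - origin) ** k for aces, c in enumerate(counts)), N)

nu1, nu2, nu3, nu4 = nu(1), nu(2), nu(3), nu(4)

# Transfer to moments about the mean.
mu2 = nu2 - nu1 ** 2
mu3 = nu3 - 3 * nu1 * nu2 + 2 * nu1 ** 3
mu4 = nu4 - 4 * nu1 * nu3 + 6 * nu1 ** 2 * nu2 - 3 * nu1 ** 4

mean = origin + nu1           # distance of the mean from the real zero
sigma = float(mu2) ** 0.5     # standard deviation
print(nu1, nu2, nu3, nu4, mu2, mu3, mu4)
```

Working in `Fraction` keeps every intermediate value in the same form as the printed fractions, so each step can be compared with the text directly.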
I. Correlation.


Example. Typical Correlation Table showing the relationship of the stature of
Fathers and Sons in inches.*

                                   STATURE OF FATHER.
STATURE
OF SON.       58.5-61.5  61.5-64.5  64.5-67.5  67.5-70.5  70.5-73.5  73.5-76.5    Totals

                  -6         -3          0          3          6          9
58.5-61.5          -        1.5          2          -          -          -         3.5
                 .05        .36       1.20       1.32        .50        .03

                  -4         -2          0          2          4          6
61.5-64.5        3.5         19         33        5.5        1.5          -        62.5
                 .84       6.50      21.75      23.87       9.02        .55

                  -2         -1          0          1          2          3
64.5-67.5        8.5      53.75        148       80.5       8.25          -         299
                4.02      31.07     104.03     114.15      43.14       2.64

                   0          0          0          0          0          0
67.5-70.5        2.5      33.25     149.25     202.25      60.25        3.5         451
                6.07      46.86     156.90     172.17      65.06       3.97

                   2          1          0          1          2          3
70.5-73.5          -        3.5      39.75     104.25       62.0        3.5         213
                2.87      22.13      74.10      81.31      30.73       1.88

                   4          2          0          2          4          6
73.5-76.5          -          1          3       14.5       20.5        2.5        41.5
                 .56       4.31      14.44      15.84       5.99        .37

                   6          3          0          3          6          9
76.5-79.5          -          -          -        4.5        3.0          -         7.5
                 .10        .77       2.59       2.84       1.07        .07

Totals          14.5        112        375      411.5      155.5        9.5        1078

Note. The numbers above the figures in the middle of each square are the products of the
deviations from the chosen origin.


Coefficient of Correlation is defined by $r = \frac{\Sigma xy}{N\sigma_1\sigma_2}$, where $\Sigma xy$ is equal to the sum
of all the observations multiplied by the product of their deviations from the mean
in the vertical and horizontal directions. It is termed the product moment.

If $\bar{x}, \bar{y}$ be the distances of the mean from the chosen origin, and $\Sigma x'y'$ the
product moment round that origin, $\Sigma xy = \Sigma x'y' - N\bar{x}\bar{y}$.
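The working of the product moment may be sketched for the table of fathers and sons as follows. This is an illustration under stated assumptions: the counts are the middle figures of each square, with rows taken as the sons' classes; deviations are counted in classes of 3 inches from a chosen origin (the variable names and the choice of origin are ours, and r does not depend on that choice).

```python
from math import sqrt

# Observed counts: rows = son's class, columns = father's class.
observed = [
    [0,   1.5,   2,      0,      0,     0],
    [3.5, 19,    33,     5.5,    1.5,   0],
    [8.5, 53.75, 148,    80.5,   8.25,  0],
    [2.5, 33.25, 149.25, 202.25, 60.25, 3.5],
    [0,   3.5,   39.75,  104.25, 62.0,  3.5],
    [0,   1,     3,      14.5,   20.5,  2.5],
    [0,   0,     0,      4.5,    3.0,   0],
]
son_origin, father_origin = 3, 2   # index of the class taken as chosen origin

N = sum(map(sum, observed))
sx = sy = sxx = syy = sxy = 0.0
for i, row in enumerate(observed):
    for j, n in enumerate(row):
        y, x = i - son_origin, j - father_origin   # deviations from the chosen origin
        sx += n * x
        sy += n * y
        sxx += n * x * x
        syy += n * y * y
        sxy += n * x * y                           # product moment round the origin

x_bar, y_bar = sx / N, sy / N                      # distances of the mean from the origin
product_moment = sxy - N * x_bar * y_bar           # product moment round the mean
sigma_x = sqrt(sxx / N - x_bar ** 2)
sigma_y = sqrt(syy / N - y_bar ** 2)
r = product_moment / (N * sigma_x * sigma_y)
print(round(r, 2))
```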


* Pearson, Biometrika, vol. ii. p. 415.


II.  Contingency. 


In  the  above  table,  beneath  each  figure  is  printed  in  smaller  characters  a figure 
which  shows  the  number  of  cases  to  be  expected  if  there  were  no  relationship 
between  the  stature  of  fathers  and  sons.  These  are  obtained  by  taking  the  total  of 
each  horizontal  series  and  dividing  it  in  the  same  proportions  as  are  given  in 
the  horizontal  series  showing  the  totals  in  each  column. 

To  obtain  the  coefficient  of  contingency,  take  the  difference  of  each  theoretical 
number  from  the  corresponding  actual  number,  square  this  difference,  and  divide 
by  the  theoretical  number. 

Thus, in the first row and third column, the theoretical number is 1.20 and
the actual is 2, whence we have $\frac{(2 - 1.20)^2}{1.20} = .53$.

All the numbers found in this way are summed. The total is denoted by $\chi^2$.
This total divided by N, the total number of observations, is further denoted by $\phi^2$,
whence we have the coefficient of contingency

$$r = \sqrt{\frac{\phi^2}{1 + \phi^2}}.$$
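The steps for the coefficient of contingency can be sketched as below, again using the observed counts of the table of fathers and sons (rows taken as sons' classes). The variable names are illustrative, and the theoretical numbers are computed afresh from the marginal totals, so they may differ from the printed ones in the last figure.

```python
from math import sqrt

observed = [
    [0,   1.5,   2,      0,      0,     0],
    [3.5, 19,    33,     5.5,    1.5,   0],
    [8.5, 53.75, 148,    80.5,   8.25,  0],
    [2.5, 33.25, 149.25, 202.25, 60.25, 3.5],
    [0,   3.5,   39.75,  104.25, 62.0,  3.5],
    [0,   1,     3,      14.5,   20.5,  2.5],
    [0,   0,     0,      4.5,    3.0,   0],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Theoretical number in each square if there were no relationship:
# the row total divided in the proportions of the column totals.
expected = [[rt * ct / N for ct in col_totals] for rt in row_totals]

# Square each difference, divide by the theoretical number, and sum.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
phi2 = chi2 / N
contingency = sqrt(phi2 / (1 + phi2))

first_row_third_col = (observed[0][2] - expected[0][2]) ** 2 / expected[0][2]
```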


III.  Fourfold  Division  Method. 

Example. Smallpox and Vaccination. Sheffield, 1887-88.*

                  Recoveries.    Deaths.    Totals.

Vaccinated           3951          200        4151
Unvaccinated          278          274         552

Totals               4229          474        4703

If the same method as is shown in paragraph I above is used, and if the
fourfold division be

        a    b
        c    d

then

$$r = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}.$$

No method, however, applied to this fourfold division is satisfactory.
Pearson's fourfold division method gives

r = .77.
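For comparison, the product-moment formula applied directly to the fourfold table may be worked as in the sketch below. The function name is ours, and the value it returns is that of the simple formula only, not of Pearson's fourfold division method, whose result (.77) is reached by a different process.

```python
from math import sqrt

def fourfold_product_moment(a, b, c, d):
    """Product-moment correlation of a fourfold table laid out as a b / c d."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Vaccinated: 3951 recoveries, 200 deaths; unvaccinated: 278 recoveries, 274 deaths.
r_fourfold = fourfold_product_moment(3951, 200, 278, 274)
print(round(r_fourfold, 2))
```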


* Macdonell, Biometrika, vol. i. p. 376.


Examples  of  Correlation 


I. AGES AT MARRIAGE OF BACHELORS AND SPINSTERS,
ENGLAND AND WALES, 1901.

AGES OF                                                   AGES OF SPINSTERS.
BACHELORS.    15-20    20-25    25-30    30-35   35-40   40-45   45-50   50-55   55-60   60-65   65-70   70-75   75-80     Totals    Mean

15-20         2,606    1,356       75        4       2       -       -       -       -       -       -       -       -      4,043   19.39
20-25        14,821   73,430   12,989    1,110     123      20       4       -       -       -       -       -       -    102,497   22.54
25-30         2,785   37,317   33,229    5,249     648      76       9       2       -       -       -       -       -     79,315   25.23
30-35           482    6,657   10,184    5,908   1,217     182      24       4       -       -       1       -       -     24,659   27.78
35-40           103    1,317    2,545    2,246   1,432     326      57       3       2       1       -       -       -      8,032   30.51
40-45            15      322      594      726     631     427      96      17       2       -       -       -       -      2,830   33.50
45-50             3       67      158      221     250     206     112      19       6       1       -       -       -      1,043   36.28
50-55             3        3       35       73      86      87      67      39      11       2       -       -       -        406   40.31
55-60             -        -        9       22      32      24      30      17      14       5       -       -       -        153   43.25
60-65             -        1        6        8       6      12      11       5       9       9       -       -       -         67   45.50
65-70             -        -        2        2       3       1       5       3       4       2       1       -       -         23   47.50
70-75             -        -        -        -       1       1       -       -       1       1       1       2       1          8   61.35
75-80             -        -        1        -       -       1       -       1       -       -       -       1       1          5   54.50

Totals       20,818  120,470   59,827   15,569   4,431   1,363     415     110      49      21       3       3       2    223,081   24.56

Mean          22.87    24.78    27.91    32.36   35.88   40.74   45.59   50.36   55.76    59.4    57.1    67.3    75.0      26.38

II.  CORRELATION  OF  NUMBER  OF  MULLERIAN  GLANDS  ON  THE  RIGHT 
AND  LEFT  LEGS  OF  2000  SWINE  (DAVENPORT). 


                                  NUMBER OF GLANDS ON RIGHT LEG.
NUMBER OF GLANDS
ON LEFT LEG.        0      1      2      3      4      5      6      7      8      9     10    Totals

     0              8      4      2      -      -      -      -      -      -      -      -       14
     1              5    151     65     14      5      1      -      -      -      -      -      241
     2              2     58    154     88     27      7      -      -      -      -      -      336
     3              -      9     96    173    119     24      8      1      -      -      -      430
     4              -      3     28    128    153     92     16      8      1      -      -      429
     5              -      -      7     28     77    101     58     20      3      1      -      295
     6              -      -      1      6     26     52     48     18      5      3      -      159
     7              -      -      -      -      3     11     16     17      3      3      -       53
     8              -      -      -      -      1      9      7      9      2      2      -       30
     9              -      -      -      -      -      -      -      5      2      2      1       10
    10              -      -      -      -      -      -      2      -      -      1      -        3

Totals             15    225    353    437    411    297    155     78     16     12      1     2000

Mean              .60   1.36   2.31   3.20   3.89   4.78   5.51   6.14   6.50   7.33   9.00

Mean No. of Glands: right leg = 3.55; left leg = 3.54. Standard Deviation: right leg = 1.72; left leg = 1.73.

Correlation: r = 0.792.


Partial  Correlation  Coefficients 


If there be three variables which are all correlated with one another, and if
$r_{1.2}, r_{2.3}, r_{1.3}$ denote the correlations of each pair, the partial correlation coefficients
are given by

$$r_{12.3} = \frac{r_{1.2} - r_{1.3} \times r_{2.3}}{\sqrt{1 - r_{1.3}^2}\,\sqrt{1 - r_{2.3}^2}}, \text{ etc.};$$

where $r_{12.3}$ denotes the correlation between the first and second variables
when the third is constant.


Example. Correlation between the amount of Summer Diarrhoea, the Mean
Temperature in July and the Rainfall in the same month.


TABLE I. — Death rate from Diarrhoea per million per year in London,
and the Mean Temperature of July (Greenwich).


TABLE  II. — Correlation  between  Temperature  and  Rainfall. 


1 

RAINFALL  IN  INCHES. 

1 

0-1 

1-2 

2-3 

3-4 

4-5 

5-6 

6-7 

Totals. 

H 

57  -59“ 

— 

__ 

— 

1 

_ 

_ 

1 

2 

H 

59"-61“ 

1 

2 

2 

3 

1 

- 

- 

9 

H 

S 

W 

El 

Gr-()3“ 

63“-65° 

65“-67° 

3 

2 

] 

2 

3 

2 

1 

2 

4 

1 

- 

1 

1 

1 

8 

8 

8 

Totals,  - 

6 

8 

7 

9 

1 

2 

2 

35 

$r_{t.r} = -.395$,


TABLE  III. — Correlation  Diarrhoea  and  Rainfall. 


$r_{d.r} = -.295$.


Partial Correlations.

$$r_{dr.t} = \frac{-.295 - (.650) \times (-.395)}{\sqrt{1 - .650^2}\,\sqrt{1 - .395^2}} = -.070.$$

$$r_{dt.r} = \frac{.650 - (-.295)(-.395)}{\sqrt{1 - .295^2}\,\sqrt{1 - .395^2}} = .643.$$


Probable  Error. 

Example of the variation in groups of small numbers. Number of deaths in
parallel series of fifties among the admissions of patients suffering from
scarlet fever to Belvidere Hospital, 1900-1908.


1,  - 

2,  1 

3,  2 

2,  1 

1,  4 

2,  1 

9 - 

5,  4 

3,  3 

1,  2 

3 

3,  1 

1,  3 

1,  3 

b 4 

3,  2 

3,  2 

4,  ■'> 

1,  3 

3,  2 

3,  3 

3,  - 

2,  3 

7,  4 

3,  - 

1,  4 

1,  3 

2,  6 

1 

2,  1 

3,  - 

4,  1 

2,  1 

2 2 

0 0 

2,  1 

3,  1 

3,  2 

3,  2 

1 

1 

1,  2 

1,  - 

2,  7 

3,  4 

2,  4 

1,  1 

Table  showing  the  distributions  of  these  figures  when  classified. 


                   No. of Times    Theoretical                    Difference squared divided
No. of Deaths.     observed.       Number.        Difference.     by the Theoretical Number.

     0                 10            10.4             .4                  .00
     1                 25            23.4            1.6                  .11
     2                 23            25.8            2.8                  .30
     3                 22            18.7            3.3                  .59
     4                  9             9.8             .8                  .07
     5                  2             4.0            2.0                 1.00
     6                  1             1.3 }
     7                  2              .4 }          1.3                 1.00

   Total               94            93.8              -                 3.07 = χ²

Mean = 2.16 deaths, σ = 1.437, $\frac{\sigma}{\sqrt{N}} = .148$.

If a quantity be measured, then the probable error of that quantity is defined
by the limits within which it is equally likely that the quantity would in a long
series of measurements be found to lie inside and outside.

If σ be the standard deviation of the quantity, the limits above defined are
given by ±.674σ.

It is better to take, however, σ itself and call it the "standard error."

If ±2σ are taken as the limits of error, then the odds are 21 to 1 that the real
value of the quantity lies between these limits.

Standard  Errors. 


(1) Of the mean of a number N of quantities $= \frac{\sigma}{\sqrt{N}}$.

(2) Of a correlation coefficient r when N observations have been made $= \frac{1 - r^2}{\sqrt{N}}$.

Test  of  Goodness  of  Fit  between  Theory  and  Observation. 

Take each group, subtract the actual from the theoretical value, square, divide by
the theoretical value, and sum. The sum is denoted by $\chi^2$.

Find P, or the probability that in a certain number of trials more difference
between theory and observation would be found. A short table of this function
giving the values of $\chi^2$ and the number of groups compared, N, is printed below.

Example. 


Table  showing  Days  of  Sickening  in  907  Cases  of  Scarlet  Fever. 


              No. of     Theoretical                                     (Difference)²
              Cases.     Value.        Difference.    (Difference)².     ÷ Theoretical Value.

Sunday         124        129.6           -5.6            31.36                .24
Monday         143        129.6           13.4           179.56               1.38
Tuesday        117        129.6          -12.6           158.76               1.22
Wednesday      134        129.6            4.4            19.36                .15
Thursday       120        129.6           -9.6            92.16                .71
Friday         143        129.6           13.4           179.56               1.38
Saturday       126        129.6           -3.6            12.96                .10

Total          907        907                            673.72               5.18

Thus P = .522, or in half the trials made as much divergence would
be found.
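The test may be sketched for the days of sickening as follows. Two assumptions are made that the handout does not state in this form: the theoretical value is taken as exactly 907/7 rather than the rounded 129.6, and P is found from the closed form of the chi-square tail, which holds when the number of groups less one is even (the helper name is ours). With seven groups this gives a value close to the printed .522.

```python
from math import exp, factorial

observed = [124, 143, 117, 134, 120, 143, 126]       # Sunday ... Saturday
theoretical = sum(observed) / len(observed)          # equal numbers on every day

chi2 = sum((o - theoretical) ** 2 / theoretical for o in observed)

def tail_probability(chi2_value, groups):
    """P that chance alone gives a larger chi-square; needs an odd number of groups."""
    freedom = groups - 1
    if freedom % 2:
        raise ValueError("closed form used here needs an odd number of groups")
    half = chi2_value / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(freedom // 2))

P = tail_probability(chi2, len(observed))
print(round(chi2, 2), round(P, 2))
```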


Table showing the Values of χ² for certain Values of P and N.

                                 Values of P.
  N       .9      .8      .7      .6      .5      .4      .3      .2      .1

  3        -       -      .7     1.0     1.2     1.8     2.4     3.4     4.6
  4        -    1.00     1.4     1.9     2.4     2.9     3.6     4.6     6.1
  5      1.0     1.7     2.2     2.8     3.4     4.1     4.9     5.9     7.7
  6      1.6     2.3     3.0     3.6     4.3     5.1     6.0     7.3     9.2
  7      2.2     3.0     3.8     4.6     5.3     6.2     7.2     8.6    10.5
  8      2.9     3.8     4.7     5.4     6.3     7.3     8.4     9.7    12.0
  9      3.5     4.6     5.5     6.4     7.4     8.4     9.5    11.0    13.2
 10      4.2     5.4     6.4     7.4     8.4     9.4    10.6    12.2    14.6
 12      5.6     7.0     8.2     9.3    10.4    11.5    12.8    14.5    17.2
 14      7.0     8.7     9.9    11.1    12.3    13.6    15.1    16.9    19.6
 16      8.6    10.3    11.8    13.0    14.4    15.7    16.3    19.3    22.3


THE THEORY OF CHANCE DISTRIBUTION
AS APPLIED TO BIOLOGY.


By 


John  Brownlee,  M.D 




In recent years a large amount of work has been done
regarding the forms of distribution which occur in biological
measurements, but most of this work has been inductive and
most of the methods proposed, though permitting elegant
mathematical developments, throw little light on the causation
of different types of distribution. The points which are
specially important from the biological side have been much
neglected. Few attempts have been made to enquire into the
reasons which determine the distributions observed, to
estimate how far they represent some real vital factor or to
ascertain the extent to which they are "artefacts", that is,
the result of the application of some particular standard of
measurement.

I propose, then, in this paper to discuss frequency
distributions with a view to ascertain how types of
distribution arise and, when such types have arisen, to see how far
they conform to the results of biological observation and to
examine how far the reverse problem of reasoning from a
curve to a biological process can be justified.

Chance distributions have, for a couple of centuries,
been the subject of much discussion. The theory was first
put on a scientific basis by Laplace and Gauss. Both of
these reached by somewhat different processes the curve now
known as the normal curve of error, a curve which has been
found to give an adequate description of many of the
measurements made in the organic and inorganic world. Further
developments to allow for asymmetry have been made by
McAllister, Edgeworth and Pearson in this country, and by
Kapteyn, Fechner and Thiele abroad. These in different ways
arrive at forms of curves which in many cases very closely
describe groups of statistics which have asymmetrical
distributions. In spite of this fact, I feel that the initial
reasoning shows far too little consideration to the
requirements of biology. Though the mathematics is fascinating, it
cannot but be felt that no clear idea can be formed as to
the meaning the initial hypotheses have in the world of life.

In seeking a foundation hypothesis I think that one can
do no better than choose the simple binomial $(p + q)^n$, the
form selected by Laplace as the starting point of his
investigation. Its meaning is always clear. If p = q this form
is indistinguishable from the normal curve of error even
though n be comparatively small. If p be greater than q the
normal curve also results if n be large, but if n be small
the resulting curve takes the form known as Pearson's Type III.
If p = q positive and negative errors are equally probable.
In all cases each variation is independent of the other, although
a small amount of correlation, as shown by Edgeworth, does
not affect the ultimate form of the curve to any great
extent. In biology, if p = q, such a condition as is seen
in the inheritance of stature, where the mean stature of
the offspring is determined by the means of the elements
derived from the parents and the distribution of both is
equally normal, is represented. On the other hand, if p > q,
such a condition as dominance is described. As an example of
this in the same range it may be stated that if certain tall
varieties of pea be mated with certain dwarf varieties all
the offspring in the first generation are tall and in the
stable population there are ultimately three tall plants to
one dwarf. This is represented by the form (p + q) where
p = 3/4 and q = 1/4. If tallness depends upon a number of factors
the variation of the mixed race would in the simplest case
correspond to $(\tfrac{3}{4} + \tfrac{1}{4})^n$ where n denotes the number of pairs
of qualities on which stature depends, assuming that all the
pairs determining tallness and shortness come originally from
the same plants. Allowing for such a condition as partial
dominance where the offspring takes more markedly after one
parent than after the other and also for coupling where
definite pairs of elements seem to have some special affinity,
it can be shown that all the different distributions which
have been biologically found are adequately expressed as direct
derivatives of one or other of the two expressions given above.
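How closely the symmetrical binomial follows the normal curve, and how the unequal binomial loses its asymmetry as n grows, can be seen from a short numerical sketch. The values n = 16, n = 4 and n = 400 are chosen here only for illustration, and the function names are ours.

```python
from math import comb, exp, pi, sqrt

def binomial_terms(p, n):
    """Successive terms of the expansion of (q + p)^n, with q = 1 - p."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

def normal_ordinates(p, n):
    """Ordinates of the normal curve having the same mean and standard deviation."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return [exp(-(k - mean) ** 2 / (2 * sd * sd)) / (sd * sqrt(2 * pi))
            for k in range(n + 1)]

def skewness(terms):
    """Third moment about the mean divided by the cube of the standard deviation."""
    mean = sum(k * t for k, t in enumerate(terms))
    mu2 = sum((k - mean) ** 2 * t for k, t in enumerate(terms))
    mu3 = sum((k - mean) ** 3 * t for k, t in enumerate(terms))
    return mu3 / mu2 ** 1.5

# p = q: already close to the normal curve for a modest n.
largest_gap = max(abs(b - g) for b, g in
                  zip(binomial_terms(0.5, 16), normal_ordinates(0.5, 16)))

# p = 3/4, q = 1/4: markedly skew for small n, nearly symmetrical for large n.
skew_small_n = skewness(binomial_terms(0.75, 4))
skew_large_n = skewness(binomial_terms(0.75, 400))
```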


Before proceeding further the basis of the theories of
asymmetrical distribution as developed by Profs. Edgeworth
and Pearson, which may be taken as representative of all, will
be briefly outlined. The former calls his first method "the
method of translation" of the normal curve. (1) In this the
normal curve is taken as the general law and the frequency of
some quality assumed to vary in this manner. The frequency
corresponding to a particular element of abscissa in this
curve is:-

t t 

If  now  % - /(t;  so  that  ^ varies  as  the 

corresponding  frequency  of  ^ is  equal  to 

(m'- 


is the equation of the new distribution. This is evidently
quite sound reasoning, provided a justification for the
application of the process can be found. The formula, however,
will cover almost any distribution. In general for practical
purposes the series is assumed to converge rapidly. Later
Prof. Edgeworth arrives at a more general formula derived
from the interaction of many factors. (2) (3) "The rationale
of this method consists not merely, and not principally, in
its exactly representing those cases in which each member of
the frequency group under consideration is a definite function
of some member of a group distributed according to the normal
law of error. For instance, if the velocities of winds, or
of any other objects, are distributed normally, then the
corresponding energies will be distributed according to a
frequency curve which is an exact translation of a normal
curve. So if the diameters of oranges or other spherical
bodies vary normally, the solid contents are proportionate to
the cubes of the diameters". He says further, "The cases in
which translation in the formula are doubtless not uncommon
(compare "Journal of the Statistical Society", Vol. LXI, 1898,
p. 678) but the reason of the method lies deeper. It consists
in the affinity of the formula to that universal law which is
ever and everywhere approximately fulfilled through the whole
realm of Statistics - Statistics proper as distinguished from
arithmetic by sporadic or fortuitous dispersion".
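The instance of the oranges can be put numerically. The sketch below assumes, purely for illustration, diameters distributed normally with mean 8 and standard deviation 0.5 (hypothetical figures, not from the paper); the frequency of the solid content v is then the normal frequency of the corresponding diameter multiplied by dd/dv, which is the translation in its simplest form.

```python
from math import exp, pi, sqrt

MU, SIGMA = 8.0, 0.5      # hypothetical mean and standard deviation of diameter

def diameter_frequency(d):
    """Normal curve of error for the diameter."""
    return exp(-(d - MU) ** 2 / (2 * SIGMA ** 2)) / (SIGMA * sqrt(2 * pi))

def volume(d):
    return pi / 6 * d ** 3

def volume_frequency(v):
    """Translated curve: frequency of the solid content v."""
    d = (6 * v / pi) ** (1 / 3)        # diameter corresponding to v
    dd_dv = 2 / (pi * d * d)           # since dv/dd = pi d^2 / 2
    return diameter_frequency(d) * dd_dv

# Integrate the translated curve by the trapezoidal rule over six sigma each way.
low, high = volume(MU - 6 * SIGMA), volume(MU + 6 * SIGMA)
steps = 20000
h = (high - low) / steps
grid = [low + i * h for i in range(steps + 1)]
freq = [volume_frequency(v) for v in grid]

total = h * (sum(freq) - 0.5 * (freq[0] + freq[-1]))
mean_volume = h * (sum(v * f for v, f in zip(grid, freq))
                   - 0.5 * (grid[0] * freq[0] + grid[-1] * freq[-1]))
```

The total frequency is unchanged by the translation, while the mean content exceeds the content of the orange of mean diameter, which shows the asymmetry introduced.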


Prof. Pearson on the other hand takes as his basis the
differential equation:-

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{a + bx + cx^2}.$$

This he obtains from the differential equation of the
normal curve, $\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\sigma^2}$, by substituting a function
of x for the constant in the denominator. This function is
considered sufficiently given by the first three terms of
its expansion by Maclaurin's theorem. He criticises
Prof. Edgeworth's method of translation by asking what is the
character that obeys the normal law, and states that this
had no real existence as a biological entity. To this
Prof. Edgeworth makes the following reply, "This objection
might be applicable if the proposed form was advocated as a
particular real curve related to some real attribute
distributed normally, say, as energy is to velocity. But the
objection is not equally applicable to the position now taken.
The best defence of this position is that it is the same as
Prof. Karl Pearson's. For his Types, as here interpreted,
are but particular representative curves formed by a judicious
divergence from the normal law of error (a divergence well
indicated by himself)".

My position in this paper is quite distinct from either
of these. It is much more close to Prof. Edgeworth's first
method of translation. I do not object at all to the use of
a graduation formula for special purposes. But graduation
formulae such as those of Prof. Pearson and Edgeworth tell
us nothing about the biological processes which determine the
variations and it is these specially which we wish to
investigate. Further, as will be seen, neither of these
methods completely accounts for certain curves which arise
directly in biological measurements.

The modes in which the normal curve $y = y_0 e^{-x^2/2\sigma^2}$
arises are of special interest. A full treatment is
impossible here and from the point of biology has yet to be
written. The curve is usually deduced by taking the limit
of $(\tfrac{1}{2} + \tfrac{1}{2})^n$ when n is great, but as shown by Prof. Pearson,
(5), even when n is comparatively small the approximation is
very close. The same proof holds when $(p + q)^n$ is taken
as the basis of the development, if p is nearly equal to q
and if n is large. The reasoning on which the theorem is
based can be easily extended to three dimensional space when
a "normal" surface results.

The  normal  curve  or  surface,  however,  arises  in  majsy 
different  v/ays.  For  instance,  the  solution  of  the  “random 
v/alk“  problem  by  Prof.  Pearson  is  an  example  of  hov/  even  a 
considerable  assumption  leads  to  the  normal  surface. 

Cl 

If  again,  two  races  differing  is  a spefic  quality  mix, 

A 

so  long  as  the  mean  of  the  two  elements  determining  a quality 

in  the  parents  represents  the  average  value  of  that  quality 

• 

in  the  offspring,  then  the  curve  of  distribution  of  the 
hybrid  will  be  much  nearer  the  normal  curve  than  that  of 
either  race,  and  if  quality  depend  on  several  elements  may 
be  almost  indistinguishable  from  it.  This  is  easily  seen 
from  a proof  given  in  a former  paper  ((7)  where  it  v/as  shown 


- 8 - 


thit  if  the  moments  of  the  tv^o  frequencies  round  their  res- 
pective centres  of  gravity  were  known,  those  of  a frequency 
which  was  the  distributed  product  of  these  round  its  own 
centre  of  gravity  could  be  at  once  written  down.  As  the 
problem  here  considered  from  the  assumption  made  is  equivalent 
to  the  distributive  multiplication  of  two  curves,  that  proof 
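applies. Denoting the moments of the original distributions
by $\mu_2, \mu_3, \mu_4$ and $\mu_2', \mu_3', \mu_4'$ respectively, and those of
the product by $M_2, M_3, M_4$, we have

$M_2 = \mu_2 + \mu_2'$, $M_3 = \mu_3 + \mu_3'$, $M_4 = \mu_4 + 6\mu_2\mu_2' + \mu_4'$.

This enables us at once to see how the normal curve arises;
letting for simplicity $\mu_2 = \mu_2'$ etc.,

$M_2 = 2\mu_2$, $M_3 = 2\mu_3$, $M_4 = 2\mu_4 + 6\mu_2^2$,

so that if $\beta_1, \beta_2$ and $B_1, B_2$ are Pearson's constants for the first
distributions and for the derived distribution respectively,

$B_1 = \tfrac{1}{2}\beta_1$, $B_2 - 3 = \tfrac{1}{2}(\beta_2 - 3)$,
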
or the curve of distribution of the hybrid is much nearer the
normal than that of the parent, since for the normal curve
$\beta_1 = 0$, $\beta_2 = 3$.

Further, when free mating between the original races
themselves and the hybrid takes place, the normal surface
results if the quality depends on more than one element. To
show how rapidly this takes place a series of diagrams of a
stable mixture of two races of different mean quality, when
the quality depends on one, two and three elements
respectively, is given for one of the simplest cases.

Diagram  here. 
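
The moment argument above can be checked numerically. The sketch below is only an illustration: the skew parent frequency is hypothetical and the function names are not taken from the paper. It forms the distributed product of a frequency with itself (the frequency of the sum of one value from each race; taking the mean instead of the sum only changes the scale and leaves Pearson's constants unaltered) and compares the constants before and after.

```python
from itertools import product

def pearson_betas(freq):
    """Return (beta1, beta2) for a discrete frequency {value: weight}."""
    total = sum(freq.values())
    mean = sum(x * w for x, w in freq.items()) / total
    mu2, mu3, mu4 = (
        sum(w * (x - mean) ** k for x, w in freq.items()) / total
        for k in (2, 3, 4)
    )
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def distributed_product(f, g):
    """Frequency of the sum of one value drawn from f and one from g."""
    out = {}
    for (x, wx), (y, wy) in product(f.items(), g.items()):
        out[x + y] = out.get(x + y, 0) + wx * wy
    return out

parent = {0: 8, 1: 4, 2: 2, 3: 1}      # hypothetical skew distribution
hybrid = distributed_product(parent, parent)

b1, b2 = pearson_betas(parent)
B1, B2 = pearson_betas(hybrid)
print(f"parent: beta1={b1:.4f} beta2={b2:.4f}")
print(f"hybrid: beta1={B1:.4f} beta2={B2:.4f}")
```

Both the skewness constant and the excess of the second constant over 3 are halved by one such multiplication, which is the sense in which the hybrid lies nearer the normal curve.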

This derivation of the normal curve is expressible
perhaps more easily in terms of Mendelism when the average
quality in the offspring is assumed to be equal to the
average quality of their parents. If two races with a
quality depending on two elements mix, we may denote the two
pure races by

    A A        a a
    B B        b b

These will form a stable race with the proportions

    1 AABB     2 AaBB     1 aaBB
    2 AABb     4 AaBb     2 aaBb
    1 AAbb     2 Aabb     1 aabb

Hence by our assumption that A a is of mean stature
between A A and a a etc., the resulting proportions are:-

    1  4  6  4  1

or $(1 + 1)^4$, which is the expression which most easily gives
rise to the normal curve of error. The grouping with any
number of pairs of elements can easily be deduced from the
above. Even when coupling is a marked feature this distribution
may be fairly maintained if the number of elements
determining the quality be large.
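
The grouping for any number of pairs of elements can be sketched as follows. This is a minimal illustration under the assumptions stated in the text (each pair present in the stable proportions 1 : 2 : 1, no coupling, and the quality fixed by the number of capital-letter elements carried); the function name is hypothetical.

```python
from itertools import product
from math import comb

def stature_classes(pairs):
    """Proportions of the stable race grouped by number of capital elements."""
    locus = {2: 1, 1: 2, 0: 1}          # capitals carried -> proportion (AA, Aa, aa)
    classes = [0] * (2 * pairs + 1)
    for combo in product(locus.items(), repeat=pairs):
        capitals = sum(c for c, _ in combo)
        weight = 1
        for _, w in combo:
            weight *= w
        classes[capitals] += weight
    return classes

print(stature_classes(2))
print(stature_classes(3))
```

For any number of pairs k the classes are the terms of $(1 + 1)^{2k}$, since $(1 + 2t + t^2)^k = (1 + t)^{2k}$.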

Lastly the normal curve may arise in time through the
fact that some quality varies in time according to the inverse
exponential. This law holds with regard to a large
number of processes in physico-biological chemistry. Thus I
find diseases which attack about the mean age of life have
often an approximately normal distribution (cf. sect. IX),
and many epidemics seem to run a similar course, cf. (5). In
this case, as I have before shown (l^) but repeat here for
the sake of completeness, if p be the value of the infectivity
at the beginning of the period and q the fraction determining
the rate per unit time at which the infectivity is lost, the
resulting curve will be of the form

$y = a\,p^{x} q^{x(x-1)/2}$,

which as q is less than unity is the normal curve of error.
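
A small numerical sketch may make this clearer. It assumes, as an interpretation rather than as the author's printed formula, that the cases in each unit of time are those of the preceding unit multiplied by the current infectivity, and that the infectivity itself is multiplied by q each unit; the values of a, p and q are illustrative only.

```python
import math

def epidemic_curve(a, p, q, steps):
    """Cases per unit time when infectivity starts at p and decays by factor q."""
    cases, infectivity, out = a, p, []
    for _ in range(steps):
        out.append(cases)
        cases *= infectivity      # new cases arise in proportion to infectivity
        infectivity *= q          # infectivity is lost at a constant rate
    return out

curve = epidemic_curve(a=1.0, p=2.0, q=0.9, steps=15)
logs = [math.log(y) for y in curve]
second_diff = [logs[i + 1] - 2 * logs[i] + logs[i - 1] for i in range(1, len(logs) - 1)]
print(second_diff[:3])
```

The second difference of log y is constant and equal to log q, so log y is a quadratic in the time with a negative leading term when q is less than unity, which is the property of the normal curve.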


B. Frequencies derivable from

The different methods by which asymmetrical distributions
and also symmetrical but not normal distributions may
arise will now be considered seriatim, by the selection of
a number of typical cases in which the causes of the
departure from the form $(p + q)^n$ can be easily seen. The
asymmetrical, as much the most important, will be discussed
in the first place.

(a) Asymmetrical or Skew Distributions.

I. Skew distributions arise simply from the method of
measurement. Two methods of measurement may often be equally
probable a priori; for example, notes in music may be measured
either by their position on the scale or by the number of
their vibrations. It is obvious that if the distribution be
symmetrical under either of these conditions it cannot be
symmetrical under the other.

II. They arise when areas or masses are taken as units of
measurement if the variation of linear dimensions is
symmetrical.

III. They arise when ratios are used in place of direct
measurements, if the direct measurements are symmetrically
distributed. The typical variation of ratios is skew, marked
degrees of skewness being readily obtainable. Symmetry is
really accidental.

IV. They arise again when the quality which is being
measured has some inverse ratio to the quality which actually
varies.

V. They arise when the variation of the quantity
measured is due indirectly to symmetrical variation. This
is very common.

VI. They arise when unequal numbers of races mix freely,
though the original race may have varied in a symmetrical
manner.

VII. They arise from the fact that many qualities in
life vary so as to be easily graduated by the terms of a
geometrical progression.

VIII. They arise when frequency distributions have time
as the independent variable.

IX. They arise in such distributions as represent the
frequency of disease at different ages.

I. As an example of the changes made by the choice of a
scale of measurement, the number of notes of definite pitch
occurring in the soprano songs of Schubert has been chosen.
As in any song of definite key certain notes, the tonic,
the mediant etc., tend to occur with much greater frequency
than others, one soprano song in each of the twelve keys
has been taken and the first fifty notes of that song
apportioned to their proper pitch. In this case we have
two obvious modes of measurement, that by octaves, which is
the method adopted in musical notation, and that by vibrations,
which is the method in science. In the first case, the
number of octaves above and below the mean pitch is
theoretically infinite, though but a few are audible
as music. In the second case, the actual number of
vibrations extends from zero to infinity. It is obvious
that the logarithm of the independent variable of the latter
scale is the independent variable of the former. The
relationship between the frequencies on these two scales is
therefore immediately determined. If the normal curve fits
the distribution of notes as measured by octaves, the
distribution measured by the scale of vibrations will be the
Galton-Macalister curve. For if we take $y = y_0 e^{-X^2/2\sigma^2}$
to represent the distribution on the first supposition with
X as the independent variable, the complete change is
obtained by making X equal to log x. As the amount
present on each element of abscissa in the first case is
$y\,dX = y\,dx/x$, in the transformed curve the amount resting
on the new element is $\frac{y_0}{x} e^{-(\log x)^2/2\sigma^2}\,dx$.

This is the Galton-Macalister curve, which is thus seen to be
a simple case of "translation" of the normal curve. If on
the other hand the musician's ear really estimates the
number of vibrations, only using octaves for convenience of
notation, a normal curve might be reasonably expected to
represent the frequency on the scale of vibrations. The
curve of frequency on the scale of octaves will be obtained
by substituting $e^{X}$ for x. It is obviously

$y = y_0\, e^{X} e^{-(e^{X} - a)^2/2\sigma^2}$,

a being the mean on the scale of vibrations.

The example chosen suffers under various defects. It is
exceedingly difficult to get twelve characteristic songs
upon twelve different keys. This was found especially with
regard to the songs on the keys of C sharp and G flat.
These songs have an undue number of repetitions of the same
notes, and when the notes are grouped for calculation it is
found that out of one hundred notes taken from these keys,
thirty-nine fall in one group at a low point of the scale.
As, however, it seems impossible to select sufficient songs
on different keys from any other composer I have not tried
to repeat the experiment. The frequencies have been fitted
to the normal curve on both hypotheses. The observed
values and the values obtained on the two different
hypotheses are tabulated in parallel columns with the values
of $\chi^2$ added for comparison.

In neither case is the fit a good one, but in the case where
the vibrations are taken as the independent variable there
is very little divergence between the facts and theory,
except at the point already referred to, which accounts for
two-thirds of the value of $\chi^2$. Where the octaves are
the independent variable the fit is not nearly so good on
the whole.
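
The change of scale described above can be sketched numerically. The example assumes natural logarithms as the unit of the octave scale and uses illustrative constants (a centre at 440 vibrations and a spread of 0.3), neither of which comes from the paper. It checks that the amount resting between two pitches is the same whether reckoned on the logarithmic scale or, with the translated ordinate, on the scale of vibrations.

```python
import math
from statistics import NormalDist

def galton_macalister(x, mu, sigma):
    """Ordinate on the vibration scale when log x is normal (mu, sigma)."""
    return NormalDist(mu, sigma).pdf(math.log(x)) / x

def area(f, lo, hi, steps=20000):
    """Trapezoidal area under f between lo and hi."""
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

mu, sigma = math.log(440.0), 0.3       # illustrative values only
octave_scale = NormalDist(mu, sigma)
on_vibrations = area(lambda x: galton_macalister(x, mu, sigma), 300.0, 600.0)
on_octaves = octave_scale.cdf(math.log(600.0)) - octave_scale.cdf(math.log(300.0))
print(on_vibrations, on_octaves)
```

The factor 1/x in the ordinate is what makes the curve skew on the scale of vibrations although it is symmetrical on the scale of octaves.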

As far as this one example goes it may be taken to
show that the musician's ear is attuned more to vibrations
than to octaves. Exactly the same kind of remark must
apply to such problems as guessing at tints. We do not
know a priori what mixture of influences are at play. The
biological question is to find the scale which most nearly
measures our psychical processes.

Table I, showing the concordance of fact and theory
when the notes of Schubert's songs are fitted to the Normal
Curve when the pitch is measured by octaves and by vibrations.

  Notes.       Actual      Theoretical Numbers,   Theoretical Numbers,
               Numbers.    Scale of Octaves.      Scale of Vibrations.

  A.  B♭          1              3.74                    .54
  G.  A♭          7             12.60                   6.10
  F.  F♯         28             32.32                  31.36
  D♯  E.         90             64.58                  79.53
  C♯  D.        105            102.51                 119.81
  B.  C.        127            120.66                 128.31
  A.  B♭         96            113.97                  98.89
  G.  A♭         65             79.83                  64.28
  F.  F♯         56             44.86                  36.90
  D♯  E.         19             18.69                  19.11
  C♯  D.          7              6.27                   9.04
  B.  C.          2              1.57                   4.5

  $\chi^2$                      24.03                  19.38
  P                               .013                   .131
  $\beta_1$                       .075                   .062
  $\beta_2$                      2.920                  2.746

II and III. Frequencies of Products and Indices.


Asymmetrical curves arise when ratios or products are
chosen as independent variables. The area of a leaf or the
ratio of two lengths will not be normal in distribution, if
the linear measurements on which they are based are normal.
For an examination of the principles it will be clearer to
begin with a first approximation by taking the frequency
distribution of the primary qualities as that of Type III.
The curve approaches very closely in form to the normal curve
when n is large and is chosen on account of the ease with
which it can be dealt with mathematically. The error of the
results is small and easily allowed for. Let the
measurements $\rho$, $\sigma$ have frequencies

$y_0\,\rho^{n} e^{-\gamma\rho}$, $y_0'\,\sigma^{n'} e^{-\gamma'\sigma}$.

For products and ratios new independent variables are required.

A. The first case considered is that of the frequency
distribution of m where $m = \rho\sigma$ and where $\rho$ and $\sigma$
are assumed quite uncorrelated. The surface which represents
the frequency is

$z = z_0\,\rho^{n}\sigma^{n'} e^{-\gamma\rho - \gamma'\sigma}$.

The integral desired is obviously the volume resting on
the strip between $\rho\sigma = m$ and $\rho\sigma = m + dm$ contained
between the plane of $\rho$ and $\sigma$ and the surface given
above. This frequency is most easily obtained by summing
all values up to the curve $\rho\sigma = m$ and then taking the
differential of that sum. The expression for this is easily
seen to be

$\int_0^\infty d\rho \int_0^{m/\rho} z\,d\sigma$,

from which the volume resting on the strip of area required
is found, namely

$dm \int_0^\infty z_0\,\rho^{n}\,(m/\rho)^{n'} e^{-\gamma\rho - \gamma' m/\rho}\,\dfrac{d\rho}{\rho}$,

so that the frequency of each value of m is given by

$y = z_0\, m^{n'} \int_0^\infty \rho^{\,n-n'-1} e^{-\gamma\rho - \gamma' m/\rho}\,d\rho$,

an integral which is a solution of Bessel's equation. To
illustrate the variation on this curve we may take
$n = n'$, $\gamma = \gamma'$.

Denoting the frequency of each value of m by y,

$y = z_0\, m^{n}\int_0^\infty \rho^{-1} e^{-\gamma(\rho + m/\rho)}\,d\rho$,

so that

$\int_0^\infty y\,dm = z_0\,[\Gamma(n+1)]^2/\gamma^{2n+2}$,

and likewise for the integrals $\int_0^\infty m^{k} y\,dm$.
Whence the moments round the origin are

$\mu_1' = (n+1)^2/\gamma^2$, $\mu_2' = (n+1)^2(n+2)^2/\gamma^4$, etc.,

giving $\mu_2$, $\mu_3$, $\mu_4$, whence $\beta_1$, $\beta_2$ are easily obtained. In the case of
volume where a third variable $\tau$ comes in and $m = \rho\sigma\tau$
the corresponding curve is

$y = z_0\, m^{n}\int_0^\infty\!\!\int_0^\infty (\rho\sigma)^{-1} e^{-\gamma(\rho + \sigma + m/\rho\sigma)}\,d\rho\,d\sigma$.

B. The case when $\rho$ and $\sigma$ are completely correlated
is more simple.* There is little less generality if we
assume them always directly proportional, so that

$\rho = \sigma$

becomes the limiting equation. Here the frequency of
m is the same as that of $\rho$, namely $y_0\,\rho^{n} e^{-\gamma\rho}\,d\rho$.
But this element is transferred to a different distance
on the axis and placed on a different element of abscissa.
The relation is by assumption

$m = \rho^2$, $d\rho = \tfrac{1}{2} m^{-1/2}\,dm$,

so that the curve of frequency of each value m is

$y = \tfrac{1}{2} y_0\, m^{(n-1)/2} e^{-\gamma m^{1/2}}$.

The moments are easily obtained and are

$\mu_1' = \dfrac{(n+1)(n+2)}{\gamma^2}$, $\mu_2' = \dfrac{(n+1)(n+2)(n+3)(n+4)}{\gamma^4}$, etc.

* cf. (1), page 670, note V.

C. The case of the ratio $m = \rho/\sigma$ can be solved in the
same way but more easily otherwise.* Let $\rho = m\tau$, $\sigma = \tau$.
Then the element is

$z_0\,\rho^{n}\sigma^{n'} e^{-\gamma\rho - \gamma'\sigma}\,d\rho\,d\sigma = z_0\, m^{n}\tau^{\,n+n'+1} e^{-(\gamma m + \gamma')\tau}\,dm\,d\tau$.

If m be constant, all that is necessary is to integrate
from 0 to $\infty$ with reference to $\tau$. For each value of m
we have the frequency

$y = z_0\,\Gamma(n+n'+2)\,\dfrac{m^{n}}{(\gamma m + \gamma')^{n+n'+2}}$,

or the curve is Pearson's Type VI. Let $n = n'$, $\gamma = \gamma'$
and the equation becomes

$y = y_0\,\dfrac{m^{n}}{(1+m)^{2n+2}}$.

* cf. (8) and (9), where either is treated quite
differently.


The moments of this are known. The degree of asymmetry
introduced by tabulating areas or indices can now be seen.
The values of $\beta_1$, $\beta_2$ are given in the annexed table for
values of n from 0 to 1000 for the curve Type III and for
the three instances worked out above.

Table II, showing how asymmetry arises when areas or
indices are tabulated.

            Type III.          Area: no            Area: perfect       Index: no
                               Correlation.        Correlation.        Correlation.
     n    $\beta_1$  $\beta_2$    $\beta_1$  $\beta_2$      $\beta_1$  $\beta_2$      $\beta_1$  $\beta_2$

     0     4       9          25.037   50.333     43.808   87.750      ....     ....
     1     2       6           9.248   19.68      18.507   37.408      ....     ....
     2     1.33    5           5.474   12.674     11.268   23.488      ....     ....
     5      .67    4           2.403    7.394      4.978   11.766     7.784   20.485
    10      .36    3.55        1.231    5.099      2.526    7.364     2.493    7.297
    20      .19    3.29         .621    4.052      1.262    5.153     1.047    4.670
    50      .08    3.12         .249    3.426       .503    3.850      .381    3.661
   100      .04    3.06         .125    3.210       .261    3.423      .185    3.281
  1000      .004   3.00         .00125  3.021       .025    3.042      .018    3.027
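
The entries of the table can be recomputed from the moments round the origin. The sketch below is an assumption-laden check, not the author's working: it takes the Type III frequency as proportional to $x^n e^{-x}$ (the scale constant does not affect Pearson's constants), the area as the product of two such measurements (uncorrelated) or the square of one (perfect correlation), and the index as the ratio of two uncorrelated measurements. Function names are hypothetical.

```python
def type3_moment(n, k):
    """E[x^k] for a frequency proportional to x^n e^{-x}; k may be negative."""
    value = 1.0
    if k >= 0:
        for i in range(1, k + 1):
            value *= n + i
    else:
        for i in range(-k):
            value /= n - i
    return value

def betas(raw):
    """Pearson's (beta1, beta2) from the first four moments round the origin."""
    m1, m2, m3, m4 = raw
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def type3(n):
    return betas([type3_moment(n, k) for k in (1, 2, 3, 4)])

def area_uncorrelated(n):
    return betas([type3_moment(n, k) ** 2 for k in (1, 2, 3, 4)])

def area_perfect(n):
    return betas([type3_moment(n, 2 * k) for k in (1, 2, 3, 4)])

def index_uncorrelated(n):
    # the fourth moment of the ratio exists only when n exceeds 3
    return betas([type3_moment(n, k) * type3_moment(n, -k) for k in (1, 2, 3, 4)])

for n in (0, 1, 2, 5, 10):
    print(n, type3(n), area_uncorrelated(n), area_perfect(n))
```

On these assumptions the Type III column follows $\beta_1 = 4/(n+1)$, $\beta_2 = 3 + 6/(n+1)$. A few printed entries may differ slightly from the recomputed ones, whether through the original arithmetic or through transcription.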

The general theorem of a ratio when the frequencies
of the two variables are given by a normal surface
is as simple as that given, but the expressions are more
complex. The ordinary element of volume of a normal
correlation surface is given by

$z\,dx\,dy = \dfrac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\dfrac{1}{2(1-r^2)}\left(\dfrac{x^2}{\sigma_1^2} - \dfrac{2rxy}{\sigma_1\sigma_2} + \dfrac{y^2}{\sigma_2^2}\right)\right\}dx\,dy$.

Changing to polar co-ordinates and transferring the origin
to distances a, b which correspond to the means of the
qualities measured, we have


As  regards  the  integration,  consider  first 


'ye  dy 


which  equals 


e iv  4- 


Now since $a/\sigma$ is never less than 10 in practice this last
term is negligible, so that for all practical purposes we may
take the limits of the first integral as $-\infty$ and $+\infty$.

For then we may integrate between $-\infty$
and $+\infty$ with regard to v, so that for m
constant the amount on each element, when the origin is
changed as above, is


$y\,dm = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}X^2}\,\dfrac{dX}{dm}\,dm$,

which is seen at once to be a "translation form" of the
normal curve $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}X^2}$ with

$X = \dfrac{am - b}{\sqrt{\sigma_1^2 m^2 - 2r\sigma_1\sigma_2 m + \sigma_2^2}}$,

since the factor of the exponential is the differential of
this latter expression with respect to m. Here a, b are the
means and $\sigma_1$, $\sigma_2$ the standard deviations of the two
qualities, and r their correlation.

From this general equation some light can be thrown
on the degree of skewness of ratio curves. Assuming
$a = b$, $\sigma_1 = \sigma_2 = \sigma$ and $r = 0$ the curve becomes

$y = \dfrac{1}{\sqrt{2\pi}}\,\dfrac{a}{\sigma}\,\dfrac{1+m}{(1+m^2)^{3/2}}\,\exp\left\{-\dfrac{a^2}{2\sigma^2}\,\dfrac{(m-1)^2}{1+m^2}\right\}$.

The larger $a/\sigma$, where a is the mean value and $\sigma$
the standard deviation of the variables, the more symmetrical
the curve becomes and in addition the more nearly normal
it is. To compare with our former approximation in which
Type III was used we have approximately $n = a^2/\sigma^2$.

In the table given below we see that for the cephalic
index $a/\sigma = 30$, whence $n = 1000$, and for the sacral
index $a/\sigma = 10$, or $n = 100$. From Table II it is thus
seen that the distribution curve of the former ratio would
in the case first considered be practically normal and the
last curve have a skewness lying between the values given
in the table for no correlation and perfect correlation,
namely, $\beta_1 = .125$ and $.261$ and $\beta_2 = 3.21$ and $3.423$
respectively. Skewness, however, may be much increased if
$a \neq b$, $\sigma_1 \neq \sigma_2$. The actual values found by
observation are given in Table IV.
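
The "translation form" of the ratio curve can be illustrated for the special case of equal means, equal standard deviations and no correlation. The sketch assumes the translated variable $X = (a/\sigma)(m - 1)/\sqrt{1 + m^2}$, which is a reconstruction and not a formula legible in the source, and verifies that the ordinate is the normal ordinate multiplied by dX/dm. The value $a/\sigma = 10$ is the lower limit quoted in the text.

```python
import math
from statistics import NormalDist

STANDARD = NormalDist()

def translated_x(m, a_over_sigma):
    """Normal deviate corresponding to the ratio m (assumed special case)."""
    return a_over_sigma * (m - 1) / math.sqrt(1 + m * m)

def ratio_density(m, a_over_sigma):
    """Ordinate of the ratio curve: normal ordinate times dX/dm."""
    dx_dm = a_over_sigma * (1 + m) / (1 + m * m) ** 1.5
    return STANDARD.pdf(translated_x(m, a_over_sigma)) * dx_dm

h = 1e-6
m0 = 1.2
numeric = (STANDARD.cdf(translated_x(m0 + h, 10)) - STANDARD.cdf(translated_x(m0 - h, 10))) / (2 * h)
print(numeric, ratio_density(m0, 10))
```

Because dX/dm is not constant the curve is skew, and the factor becomes more nearly constant over the effective range of m as $a/\sigma$ grows.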


TABLE IV.

Showing values of constants regarding the dimensions
of the human skull and the human sacrum in the female.

                          Mean       $\sigma$     Mean/$\sigma$    $\beta_1$     $\beta_2$

  Skull   Length   L     189.06     6.267      30.16      .001     2.958
             "           180.36     6.218      29.01      .047     3.109
          Breadth  B     140.67     5.279      26.64      .026     4.312
             "           134.68     4.773      28.22      .002     2.683
          Height   H     132.04     5.560      23.75      .092     2.802
             "           124.56     4.933      25.25      .032     3.282
          Index   B/L     74.34     6.520      11.40      .001     3.473
             "            74.73     5.963      12.53      .004     2.609
          Index   H/L     69.97     6.448      10.85      .006     3.815
             "            69.13     5.668      12.19      .167     2.792

  Sacrum  Length          10.00     1.093       9.15      .0184    3.189
          Breadth         11.50      .707      16.26      .0044    2.835
          Index   B/L    116.11    13.679       8.49      .7251    4.489

Note on Ratios and Inverse Laws.


The distributions of ratios are themselves so interesting
that a few remarks on the subject may not be out of
place. The simplest case is that of the distribution of
the ratios which the two parts of a straight line divided
at random bear to one another. Here, if the one part be
denoted by x and the other be a - x, the ratio is
$m = x/(a - x)$, so that

$x = \dfrac{am}{1+m}$, $dx = \dfrac{a\,dm}{(1+m)^2}$.

The frequency of each value of this ratio when x is the
independent variable is equal, whence the frequency of each
value of m when m is the unit of abscissa is given by
$y\,dm = a\,dm/(1+m)^2$,
or $y = a(1+m)^{-2}$ is the curve of frequency.
This result in itself is neither new nor specially
interesting. But the process is interesting when it is
noticed that the above equation is the simplest case of
Pearson's Type VI, and corresponds to the case regarding
the ratio already calculated for Type III, $y = y_0\,m^{n}/(1+m)^{2n+2}$,
when n = 0. It is one of the many instances of curves
which may be called "symmetrical" ratio-curves. The
symmetry is at once apparent if the independent variable
be changed to z where $m = e^{z}$; for instance, the
above curve takes the form $y = a/(e^{z/2} + e^{-z/2})^2$.

The group of such curves is large. The Galton-Macalister
curve is one; Type VI where m = 2n + 2
is another. Any curve of the form

$y\,dm = f(\log m)\,\dfrac{dm}{m}$

obviously fulfils the conditions for even functions f.

One such form occurred lately in connection with
Mendelian coupling and is noted specially. In this case
the number of instances is too small to allow of much
discussion. The development will be found in a previous
paper. The main facts are that from a fourfold division
a, b, c, d, the value of the ratio ad/bc was specially
required. This ratio had the following frequency:

  Value of ratio     4-5  5-6  6-7  7-8  8-9  9-10  10-11  11-12  18-19  23-24
  No. of instances    2    2    2    5    5    5      4      1      1      1

The mean of these is M = 9.1448. The median lies
somewhere nearer 9 than 8. When 81 is taken as the square
of the constant of inversion, and the curve inverted, the
new mean has an abscissa of 8.96, the median lying
slightly on the opposite side of 9; so as far then as
the figures go, it may be taken as a symmetrical ratio curve,
a result which might be expected. When the distribution
is fitted to one of Pearson's curves the constants give
Type I, but the particular variety which starts with an
infinite ordinate; in this case at a distance of 1.7 units
from zero. In view of the previous points it has therefore
no biological significance. This example was hardly worth
much additional statistical labour, but as it was an
actual complex ratio it seemed better to verify whether
the curve might not be a solution of

A trial was made to obtain a better fit by this means, but
the type of curve found was identical.

IV. Asymmetry may arise when the inverse of the
quantity is measured rather than its direct value. For
instance, if in cases of myxoedema the weights of the
patients were measured, a distribution would be obtained
probably more or less in inverse relationship to the
amount of active secretion of the thyroid gland. The
cases of asymmetry, however, due to inversion which are of
most importance occur in indices. It is either a matter
of chance or of accidental convenience whether the index
or its inverse is chosen, and the degree of skewness will
almost certainly be different in the two cases. The
mathematics of inverse curves is very simple. Take, for
instance, Type VI, which is found to represent approximately
the index curve to distributions which are normal in
character. The element of area corresponding to a definite
abscissa is in this case

$y_0\,\dfrac{x^{n}}{(1+x)^{m}}\,dx$;

putting $x = 1/x'$, the element of the new curve corresponding
to the abscissa $x'$ is

$y_0\,\dfrac{x'^{\,m-n-2}}{(1+x')^{m}}\,dx'$.

These are equivalent if m = 2n + 2, a form which may be
termed a symmetrical index curve. If $m \neq 2n + 2$ then
the index curves will have different degrees of skewness.
For instance, if m = 20 and n = 8, the curve has a
skewness of .5352 and its inverse one of .5490, while
that of the symmetrical form when m = 20, n = 9 is
.6410. The skewness of the index curve can thus be taken
as having little or no biological significance. It is
interesting to note in this connection that Type V is the
inverse of Type III. For Type III the element of area is
$y_0\,x^{n} e^{-\gamma x}\,dx$, which becomes $y_0\,x'^{-(n+2)} e^{-\gamma/x'}\,dx'$.
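
The inversion rule can be checked in a few lines. The sketch takes the forms $x^n/(1+x)^m$ for Type VI, $x^n e^{-\gamma x}$ for Type III and $x^{-(n+2)} e^{-\gamma/x}$ for Type V as assumptions (constants of proportionality are dropped), and the helper names are hypothetical.

```python
import math

def type_vi(x, n, m):
    return x ** n / (1 + x) ** m

def type_iii(x, n, g):
    return x ** n * math.exp(-g * x)

def type_v(u, n, g):
    return u ** -(n + 2) * math.exp(-g / u)

def inverted(density, u, *params):
    """Ordinate at u = 1/x, allowing for the changed element of abscissa."""
    return density(1 / u, *params) / u ** 2

for u in (0.3, 1.0, 2.5):
    print(u, inverted(type_vi, u, 9, 20), type_vi(u, 9, 20))
```

With m = 2n + 2 the Type VI curve is its own inverse; with m = 20 and n = 8 the inverse is the Type VI curve with n = 10, so that the two indices have different skewness.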

Type V is a rare curve in biology and it is quite possible
that on some of the occasions in which it appears it is
due to the quantity measured being dependent upon defect
of a quality rather than on its presence. As Type V is
deduced from Type III, so an inverse curve can be obtained
from the normal curve. In this case, if the equation of
the latter be taken

$y = y_0\,e^{-(x-a)^2/2\sigma^2}$,

the origin being the point of inversion, we obtain the
equation of the inverse as

$y = \dfrac{y_0}{x'^2}\,e^{-(1/x' - a)^2/2\sigma^2}$.

It is noted that that portion of the normal curve
corresponding to a negative abscissa has no biological significance.
Time asymmetry is also probably of inverse origin, but
from a reason which will be considered under the appropriate
heading.

V. In biology a great variety of shapes of curves
may occur directly as the result of normal variation. This
can be perhaps most easily illustrated by discussing a
specific instance. Such a structure as the human pelvis
affords a good example for an approximate solution of some
of these problems. The inlet of the pelvis is practically
a circular opening bounded by a chord corresponding to the
sacrum, which occupies a large part of the circumference.
From observation the breadth of the sacrum varies very
closely according to the normal curve of error.

For the purposes of a first approximation the pelvis
may be imagined as circular in all its parts, excepting the
sacral chord denoted by C D in the diagram. Let the middle
point of this chord be denoted by B, and let a diameter be
drawn at right angles through this point. Let this meet
the circle again in A; for the approximation the arcs A C
and A D may be imagined as rigid and hinged together at A.
This gives a rough mechanism by which the normal variation
of the chord C D may be examined. As a further construction
a second diameter is drawn through the centre O at right
angles to the first. This meets the circle in F and H; F H
is thus the transverse diameter of the pelvis and A B the
anteroposterior diameter. If now the chord C D vary
according to the normal curve of error, there will be
corresponding variations in the anteroposterior diameter
A B and in the transverse diameter F H. (From the assumed
condition the transverse diameter will not really be F H
but the distance between the points at which the tangents
parallel to A B touch the curve.) Now if A C and A D are
joined these are obviously constant lengths which may be
denoted by c. We may denote A B by h and its variation by
x. In the same way we may denote C B by a and its variation
by r. We have then, since A B C is a right angled triangle,

$(h + x)^2 + (a + r)^2 = c^2 = h^2 + a^2$,

which reduces to

$x^2 + 2hx + r^2 + 2ar = 0$,

x being negative when r is positive and vice versa. This
gives

$x = -h + \sqrt{h^2 - 2ar - r^2}$,

the positive sign of the root being taken since when
r = 0, x = 0. This gives the value of x in terms of
r. When the equation of the frequency of x is given with
x as the independent variable, it is obvious at once that if
the variation of r is normal the variation of x is skew, the
skewness increasing as the standard deviation of r increases.
A complete solution of the variations of the transverse
diameter will also be given, but if we consider that the
vertical distance from the diameter A B of every given point
of the chord A C varies normally, it will be seen that the
variation of the transverse diameter must be symmetrical and
not deviate so much from the normal curve as that of the
anteroposterior; this is in fact what is found. The actual
figures give the constants of the variation of the pelvis
as follows:-

                                $\beta_1$       $\beta_2$

  Sacral Breadth               .00445     2.8351
  Transverse Diameter          .05395     3.0386
  Anteroposterior Diameter     .1452      3.5988
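
The hinged-arc mechanism can be tried numerically. The sketch below assumes the relation $x = -h + \sqrt{h^2 - 2ar - r^2}$ between the variation r of the half-chord C B and the variation x of A B, which follows from A C being of fixed length, and takes a circle of unit radius with C D subtending 60 degrees at the circumference, so that C B is $\sqrt{3}/2$ and A B is 3/2. The standard deviations tried are illustrative, not observed values.

```python
import math
from statistics import NormalDist

def anteroposterior_variation(r, a, h):
    """Variation x of A B when the half-chord C B = a varies by r (A C fixed)."""
    return -h + math.sqrt(h * h - 2 * a * r - r * r)

def beta1_of_x(sd, a, h, steps=4001):
    """Pearson's beta1 of x when r is normal with standard deviation sd."""
    dist = NormalDist(0.0, sd)
    rs = [-6 * sd + 12 * sd * i / (steps - 1) for i in range(steps)]
    ws = [dist.pdf(r) for r in rs]                 # weights on an even grid of r
    total = sum(ws)
    xs = [anteroposterior_variation(r, a, h) for r in rs]
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    mu2 = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    mu3 = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs)) / total
    return mu3 ** 2 / mu2 ** 3

a, h = math.sqrt(3) / 2, 1.5     # unit circle, C D subtending 60 degrees
for sd in (0.05, 0.1):
    print(sd, beta1_of_x(sd, a, h))
```

A symmetrical variation of the chord thus gives a skew variation of the anteroposterior diameter, the skewness rising with the standard deviation of the chord.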

One further development can be made to show how a one-sided
frequency descending from an infinite quantity to zero is
determined by the normal variation of the chord C D. If the
tangential transverse diameter cuts A B at the point P,
then the frequency of each value of A P varies with this
extreme skewness. This investigation is directly allied
with that of the true variations of the transverse diameter
according to our mechanical hypothesis.

The true transverse diameter is the distance between
two tangents parallel to the diameter A B. As A is fixed
each arc A C or A D will rotate according to the hypothesis
through an equal angle in opposite directions, which we may
term $\theta$. It is obvious then that whether the variation of
the chord C D is positive or negative the centre of the
new circle will lie on the side of O opposite to C D, since
A is fixed and A O is constant. If A O = a the new centre
will be at the point $x = a\cos\theta$ if A is taken as the
origin of the co-ordinates. The equation of this circle
is thus obviously

$(x - a\cos\theta)^2 + (y - a\sin\theta)^2 = a^2$.

For purposes of calculation C D may be supposed to subtend
an angle of 60° at the circumference of the circle. As a
is the radius of the circle,

$C B = \tfrac{\sqrt{3}}{2}\,a$, $O B = \tfrac{1}{2}\,a$,

O B C being a right angled triangle with the angle C O B =
60°. A C is thus equal to $\sqrt{3}\,a$ and A B to $\tfrac{3}{2}\,a$. The
types of variation can now be evaluated. If P is the point
on the axis where the line joining the tangents parallel
to A B cuts the abscissa, A P is obviously equal to $a\cos\theta$.
The angle $\theta$ is the angle between the mean position of
A C and its varied position, so that, r being the variation
of C B,

$\theta = \sin^{-1}\dfrac{C B + r}{A C} - \sin^{-1}\dfrac{C B}{A C}$.

For the variation of the transverse diameter, if z be
the actual variation


SO  that 


1 


but 


\J  L / 

similarly 

ci  f 


so  that  j 

X.  4- 


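If we take as found by observation the frequency of the
variation of r, the variation of C B, to be normal,

$y = \dfrac{N}{\sigma\sqrt{2\pi}}\,e^{-r^2/2\sigma^2}$,

we have the frequency of variation for the new curves
given (a) for the transverse diameter, of which z is the
variation, by the ordinary "translation formula"

$y\,dz = \dfrac{N}{\sigma\sqrt{2\pi}}\,e^{-r^2/2\sigma^2}\,\dfrac{dr}{dz}\,dz$,

and (b) for A P

$y\,d(AP) = \dfrac{N}{\sigma\sqrt{2\pi}}\,e^{-r^2/2\sigma^2}\,\dfrac{dr}{d(AP)}\,d(AP)$.

By observation $\sigma = \tfrac{1}{16}\,C D$ and by hypothesis $C D = \sqrt{3}\,a$.
The first curve is nearly a normal curve; the second has
a form given in the subjoined table.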

a: 

V 

• 

1.000 

.0338 

.999 

19.433 

.0938 

.998 

10.819 

.1186 

.997 

6.394 

. 1455 

.995 

8.755 

.8089 

.990 

.410 

.8458 

.985 

.111  • 

.8818 

.980 

.087 
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It  is  not  pretended  that  the  above  description  holds 
rigidly,  it  is  a mere  mathematical  first  approximation 
resulting  in  a series  of  translated  normal  curves  of 
which  the  origin  can  be  at  once  shown.  It  might  be 
further  extended  if  ifinvas  worth  while  to  include  the 
fact  that  the  variation  in  the  circumference  of  the 
iliac  bones  also  obeyed  the  norm.al  law  but  the  discussion 
given is quite sufficient for the present purpose.

VI. Under the category of asymmetry due to the mixture of
races I have not been able to find any figures which will
repay the trouble of analysis. Adequate statistics of the
mixture of two races which are symmetrically disposed with
regard to some quality in which the offspring is the mean of
the parents do not seem to exist at present. Experiments
will require to be made.

Such a case would however come under the ordinary
formula for a stable race, namely

$$m^2(A,A) + 2mn(A,a) + n^2(a,a).$$

The solution is nearly identical with that Prof. Pearson
gives in his first memoir on the mathematics of evolution.
There Prof. Pearson analyses an asymmetrical distribution
into the sum of two symmetrical distributions each of which
obeys the normal law. His chief example however refers to a
ratio distribution which, as we have seen, is essentially
asymmetric and therefore not a suitable case for the
application of this method. Further, I think that it is
exceedingly unlikely that two races of the same species can
exist commingled in nature without breeding together. The
distribution will not thus be expressed by [expression
illegible].

This is practically an application of distributed
multiplication. If therefore [symbols illegible] be the
moments of [expression illegible] and [symbols illegible]
the moments of the distributed square of this, we have

[four moment relations illegible]

From the actual observations [symbols illegible] etc. are
determined, and thence [symbols illegible] etc. The problem
is thus reduced to that solved by Prof. Pearson.
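
The moment relations of this passage cannot be read in the source. As a hedged sketch only, suppose that the "distributed square" of a distribution is its convolution with itself, that is, the distribution of the sum $X_1 + X_2$ of two independent variables each following the mixed distribution. This reading, and the symbols $\nu_1$, $\mu_k$ and their primed counterparts, are assumptions introduced here and not the author's notation. Writing $\nu_1$ for the mean and $\mu_2, \mu_3, \mu_4$ for the central moments of the single distribution, and primes for those of the sum, the standard results are

$$
\begin{aligned}
\nu_1' &= 2\nu_1, \\
\mu_2' &= 2\mu_2, \\
\mu_3' &= 2\mu_3, \\
\mu_4' &= 2\mu_4 + 6\mu_2^{2}.
\end{aligned}
$$

Under this assumption the observed moments of the blended race give back $\nu_1, \mu_2, \mu_3, \mu_4$ of the mixture by simple inversion, for example $\mu_4 = \tfrac{1}{2}\left(\mu_4' - \tfrac{3}{2}\mu_2'^{\,2}\right)$. If the offspring is taken as the mean $(X_1 + X_2)/2$ rather than the sum, each $\mu_k'$ above is divided by $2^k$.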


VII. Apparently as a result of the process of
physical chemistry the geometrical progression describes
many phenomena of life. The well known example given by
de Vries of the number of flowers of Ranunculus Bulbosus
with each number of petals will illustrate this. It is
an almost perfect example of a geometrical progression.
Prof. Pearson has fitted it to the curve

[equation illegible]

but the latter formula does not give nearly so close a fit.
As the geometrical progression is itself a special case of
Type III, this example shows incidentally the difficulty of
applying the method of moments to cases of discrete variates
when the curve terminates abruptly. A table of the numbers
theoretical and actual with the corresponding values of
χ² and P is annexed. It is seen that the geometrical
progression has a probability P = .95 whereas Type III
has a probability of only .67.

Further examples of the geometrical progression
are seen in the decline of the death-rate with each year
of life with children suffering from measles, scarlet
fever, etc. This subject is only mentioned at present.
Along with the subject matter of the next two sections it
will be dealt with separately at a later date.

Table I showing number of Petals in flowers
of Ranunculus Bulbosus (de Vries).

No. of     No. of       Prof. Pearson's    Fitted to Geometrical
Petals.    Instances.   fitting.           Progression.

   5.        133           136.9               134.37
   6.         55            48.0                53.32
   7.         23            22.6                21.16
   8.          7         [illegible]             8.40
   9.          2             3.4                 3.33
  10.          2         [illegible]             1.32 }
  11.          0         [illegible]              .52 }  1.84

Total        222           222.0               222.42

χ²                          3.27                 1.09
P                            .67                  .95

VIII. In this connection that asymmetry which
arises when time is the independent variable in the curve
of distribution is considered. When observations are made
in such a matter as the date of flowering of a large number
of plants, it is uniformly found that a skew distribution
results. This is what would be expected a priori from
such limited knowledge of the physiology of the plants
as we at present possess.

Flowering is the result of a completion of a
process to which a certain summation of changes is
necessary. For a first approximation this sum may be taken
as expressed by the simple [word illegible]; if y is the
rate at which the change takes place per unit of time, yt
will represent the amount of substances necessary to
determine flowering present at any specified time. If then
a limit K must be reached before flowering can proceed, we
have the distribution really the inverse relationship
between y and t,

$$K = yt.$$

If y vary normally or according to Type III,
we have the frequency of flowering on each date given
respectively by

[two equations illegible]
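
The two frequency formulae of this passage are not legible in the source. The following is a sketch of what the relation $K = yt$ implies under stated assumptions, not a restoration of the author's expressions: the rate $y$ is taken as positive, $m$ and $\sigma$ denote the mean and standard deviation of a normally distributed $y$, and the Type III case is written in the simple form $y^{p}e^{-\gamma y}$ with origin at $y = 0$. The symbols $m$, $\sigma$, $p$ and $\gamma$ are introduced here for illustration. Since $t = K/y$, a change of variable with $\left|dy/dt\right| = K/t^{2}$ gives, for normal $y$,

$$
f(t) = \frac{K}{t^{2}}\,\frac{1}{\sigma\sqrt{2\pi}}
\exp\!\left\{-\frac{(K/t - m)^{2}}{2\sigma^{2}}\right\},
$$

and for $y$ following $y^{p}e^{-\gamma y}$,

$$
f(t) \propto \frac{K}{t^{2}}\left(\frac{K}{t}\right)^{p} e^{-\gamma K/t}
\;\propto\; t^{-(p+2)}\,e^{-\gamma K/t}.
$$

Both forms are skew in $t$ even when the distribution of $y$ is symmetrical, which is the point of the argument: early dates are compressed and late dates drawn out by the factor $K/t^{2}$.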


A large number of observations will be necessary
to settle the exact form of the rate of increase, but
whatever law is found the resulting distribution can hardly
be anything but asymmetrical.

(b) Symmetric forms derivable from (p + q)

I. Symmetric forms are much less interesting
than asymmetric, but two forms arise which are of some
importance. An example of one of these is given by Mr. Yule
on page [number illegible] of his "Theory of Statistics"
(10). In this example the proportions of male to female
births are tabulated for the registration districts of
England both as regards the ratio and the size of the
district. As is naturally to be expected, in the smaller
districts the range of variation is very considerable, while
in districts such as those containing three hundred thousand
inhabitants and upwards, the range is very small. Here the
theory of chance permits of easy evaluation of the resulting
form of curve. The chance of the birth of a boy or a
girl (1.04 to 1) being practically equal, the asymmetry
which exists may be neglected. The curve formed for each
registration district of definite size will be essentially
normal and the numerical standard deviation will be
$\tfrac{1}{2}\sqrt{n}$, where n is the number of births in
the district. The standard deviation of the ratio will then
be $1/(2\sqrt{n})$.
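
The two standard deviations quoted here follow from the binomial law with equal chances. A minimal sketch is given below, assuming independent births with a fixed chance of a boy; the function names and the district sizes in the loop are illustrative choices, not figures from Mr. Yule's table. It also shows how little the 1.04 to 1 ratio alters the result.

```python
import math


def sd_of_count(n_births, p_boy=0.5):
    """Standard deviation of the number of boys among n_births (binomial law)."""
    return math.sqrt(n_births * p_boy * (1.0 - p_boy))


def sd_of_ratio(n_births, p_boy=0.5):
    """Standard deviation of the proportion of boys among n_births."""
    return sd_of_count(n_births, p_boy) / n_births


# Equal chances compared with the 1.04 : 1 ratio quoted in the text.
p_observed = 1.04 / 2.04

# Hypothetical numbers of births, chosen only to show the narrowing of the range.
for n in (100, 10_000, 300_000):
    print(n, sd_of_ratio(n), sd_of_ratio(n, p_observed))
```

With equal chances the first function reduces to half the square root of n and the second to one over twice the square root of n, so that the range of the ratio narrows as the district grows.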

All that is therefore necessary is to know the distribution
of the registration districts in size. This is found to
conform closely to a curve of Type III, where the power
of n is negative but greater than -1, i.e. the curve starts
from infinity at the distance say c from zero. We have
thus, since for each size of district the distribution is
given by [expression illegible], where a is a constant, and
since the distribution of n is given by [expression
illegible], the final distribution given by the equation

[equation illegible]


This is a curve of considerable complexity which does
not lend itself to an easy formula for calculating the
moments. It is allied to Type IV, and it is interesting
to note that Type IV proper arises if the distribution, in
place of being that form of Type III found, is the ordinary
form when the power of n is positive. The integral then
becomes a simple Gamma function and gives the symmetrical
form of Type IV, or

[equation illegible]
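
The closing formula of this passage is not legible in the source. The reduction to a Gamma function can nevertheless be sketched under stated assumptions: for a district of $n$ births the ratio $x$ (measured from its mean) is taken as normal with standard deviation $1/(2\sqrt{n})$, and $n$ is taken to follow the ordinary Type III form $n^{p}e^{-\gamma n}$ on $0 < n < \infty$. The symbols $p$ and $\gamma$ are introduced here for illustration and are not the author's. Then

$$
f(x) \;\propto\; \int_{0}^{\infty} n^{p}e^{-\gamma n}\,\sqrt{n}\,e^{-2nx^{2}}\,dn
= \int_{0}^{\infty} n^{p+\frac12}\,e^{-n(\gamma + 2x^{2})}\,dn
= \frac{\Gamma\!\left(p+\tfrac32\right)}{\left(\gamma + 2x^{2}\right)^{p+\frac32}},
$$

so that

$$
f(x) = y_{0}\left(1 + \frac{x^{2}}{a^{2}}\right)^{-m},
\qquad a^{2} = \frac{\gamma}{2},\quad m = p + \tfrac32 ,
$$

which is the symmetrical form of Type IV. The heavier tails, as compared with a single normal curve, come from the small districts, for which the standard deviation of the ratio is large.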

To compare theory with facts, the distribution as given by
Mr. Yule has values of the essential constants [symbol
illegible] = .0167, [symbol illegible] = 7.0126, which gives
a curve of Type IV. The asymmetry being slight has been
neglected in fitting. On this hypothesis the actual and
theoretical figures of the distribution are given in the
annexed table.


Number of Registration Districts with different
proportions of Male and Female Births.

Proportion.      Actual No.      Theoretical.

465-70                1       }
471-76                1       }      .57
477-82           [illegible]         .97
483-88                4             3.01
489-94               10            10.74
495-500              33            40.29
[classes between 500 and 513 missing in source]
513-18              135           140.61
519-24               37            40.29
525-30                5            10.74
531-36                4             3.01
537-42                1              .97
543-                  1              .87

This gives χ² = 5.92 or P = .92.

This form is thus very close to that which might be
reasonably expected from the foregoing argument. It may
be objected that this is one of the cases where the result
might be expected a priori from the method of Prof. Pearson's
proof, since Type IV arises when, from a mixture of an
equal finite number of red and white balls, half are
withdrawn time after time and the distribution of the results
is tabulated; but in the example given, the number of balls
of the analogy is clearly infinite, and the normal curve
should be very approximately obeyed.

II. One other form is considered. It is a
continuation of a subject discussed in a previous paper.
It concerns that grouping of stature which results if in
inheritance coupling either internal or external occurs, or
if dominance is a factor in the heredity. It has been shown
before that if the element determining a quality such as
stature is dominant the stable race for one quality is
(3 + 1), as is well known; if a second element take part, the
stable race is either [expression illegible] or [expression
illegible]. The latter is of the form (1 + n + 1), and if
inheritance depend on numerous such pairs, we have
(1 + n + 1) raised to the corresponding power as the
resulting form.


Now this form is easily approximated to, if [symbol
illegible] be moderately large. The writing down of the
terms of a distributed multiplication may be done as
follows, a method, I think, not hitherto pointed out. Take
for example (a + b + c)^5 and write the product as follows:-

1         a^5       5a^4c     10a^3c^2  10a^2c^3  5ac^4     c^5
5b             a^4       4a^3c     6a^2c^2   4ac^3     c^4
10b^2               a^3       3a^2c     3ac^2     c^3
10b^3                    a^2       2ac       c^2
5b^4                          a         c
b^5                                1

The distributed terms are then obtained by adding
together the terms in the vertical columns each multiplied
by the corresponding terms in the first column showing the
expansion of (b + 1)^5. Thus the middle term is:-

$$6a^{2}c^{2} \times 5b + 2ac \times 10b^{3} + 1 \times b^{5}$$
or
$$30a^{2}bc^{2} + 20ab^{3}c + b^{5}.$$
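
The column rule can be verified mechanically. The sketch below is an illustration under one assumption about the layout: each term a^i b^j c^k of (a + b + c)^n is assigned to the vertical column k - i, so that a counts one step to the left, c one step to the right, and b leaves the position unchanged. The function name and the dictionary representation are choices made for this example.

```python
from collections import defaultdict
from math import factorial


def distributed_terms(n):
    """Group the terms of (a + b + c)**n by vertical column k - i.

    Returns {column: {(i, j, k): coefficient}}, where i, j, k are the
    powers of a, b, c and the coefficient is the multinomial n!/(i! j! k!).
    """
    columns = defaultdict(dict)
    for k in range(n + 1):              # power of c
        for j in range(n - k + 1):      # power of b
            i = n - j - k               # power of a
            coeff = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
            columns[k - i][(i, j, k)] = coeff
    return dict(columns)


columns = distributed_terms(5)
middle = columns[0]   # the central column of the triangle
print(middle)
```

For n = 5 the central column contains b^5 with coefficient 1, a b^3 c with coefficient 20 and a^2 b c^2 with coefficient 30, in agreement with the middle term written out above.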

If a = c each row is approximated to by the normal curve
with a standard deviation whose square is proportional to
the vertical distance from the point where the distribution
is represented by unity, i.e. where [symbol illegible] = 0.
The form of resulting distribution can thus be found to be

[equation illegible]

For purposes of approximation [symbol illegible] may be
taken as zero, and we have

[equation illegible]

or the curve partakes of the moment relationship of the
symmetrical Type IV.

This same form also arises when there is a
coupling, as here again distributive multiplication really
comes in. The ordinary form for a stable population is
as shown before

[array of the nine classes AA BB, Aa BB, aa BB, AA Bb,
Aa Bb, aa Bb, AA bb, Aa bb, aa bb with coefficients in
a and b; coefficients illegible]

where

[expression illegible], or the coupling ratio.

If we consider stature determined by the same
hypothesis as before (p. ) then the above expression
must be summed diagonally, so that we have the distribution
given by

[expression illegible]

instead of

[expression illegible]

This is however the same form as for the dominance given
before; for it can be written

[expression illegible]

A Type IV curve thus results in general, whether heredity
is dependent on coupling or dominance.

The method of analysis of a complex form such
as that of stature can now be indicated. It requires the
help of the theory of correlation. If stature depends on a
number of elements inherited independently and if the
combination of two of these determine a mean condition in
the offspring, then the coefficient of correlation between
parent and offspring will be given by r = .5 and the
curve of stature will be normal. If the curve approaches
Type IV rather than the normal, this is explained by the
fact that some of the elements are dominant. If all the
elements are dominant r = .3. Taking all distributions
of stature, it is found that though nearly normal, there is
usually a defect of fit shown by the normal curve at the
maximum. This defect means a deviation in the direction
of Type IV and suggests a certain measure of dominance.
For correlation figures we find, for instance, that the
correlation between father and son is r = .514, between
mother and son .494, and between father and mother r = .280,
showing that the real correlation between father and son
is given by [expression illegible].
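
The expression that followed "is given by" cannot be read in the source. One standard way of removing the influence of the resemblance between husband and wife is the first-order partial correlation; whether this is the formula actually used here is an assumption, and the sketch below is offered only as an illustration with the three coefficients quoted in the text.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Coefficients quoted in the text.
r_father_son = 0.514
r_mother_son = 0.494
r_father_mother = 0.280

# x = father, y = son, z = mother.
r_real = partial_correlation(r_father_son, r_father_mother, r_mother_son)
print(round(r_real, 3))
```

On this assumption the corrected coefficient falls somewhat below the theoretical value of one half.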

Thus the same kind of defect from the theoretical correlation
.5 is seen as has been observed between the actual and
theoretical distributions.

CONCLUSIONS.

The examples given in the previous pages will have
explained sufficiently the point of view from which this
paper has been written. I do not think that any advantage
can be obtained from general systems of curve derivation.
In the first place a very small alteration in the value of
the constants may make a very great difference in the type
of the theoretical curve, and the figures at our disposal
are rarely of sufficient worth to allow of fine distinctions
being drawn between curves obtained on these systems,
especially when, in addition, the differences in the type of
the curve represent really nothing in the facts. But there
is another drawback. Neither Prof. Edgeworth's nor
Prof. Pearson's methods account for many curves which arise.
Thus by the former, the independent variable x of the normal
curve is replaced by [series illegible] and the latter
assumed to be quickly convergent. In practice this is not
so. If the original curves are normal we find the curve of
a ratio given in the [expression illegible], an immediate
"translation" form. The power of the exponential is finite
for all values of x but it is not convergent in a series.
Further, the fact that it is an exact "translation" form
tells us nothing biologically; we are left "stranded on the
sands of surmise". Derived as shown above (p ) the form
is perfectly intelligible. The same results apply to the
inverse of a normal curve.

Such forms are also not comprehended by Prof.
Pearson's formula. That just given:-

[expression illegible]

gives

[differential equation illegible]

while the form for a product

[expression illegible]

is a derivative of a Bessel solution. In the same way
others of the derived curves do not satisfy his equation but
more probably an equation

[equation illegible]

In conclusion it seems to me that in biology
frequency distributions are a purely experimental science.
Mathematical assistance is essential. The discoveries
of many mathematicians have been most brilliant and
illuminating, and but for their work the present biological
statistical knowledge would not exist. But though this
work has been brilliant, it is too much divorced from life.
The real function of mathematics is to use observations
and measurements to aid in finding real laws, and to
test such hypotheses as suggest themselves. Directly
derived curves are thus of the first importance. As far as
can be judged from the study of heredity, many of the
elements are inherited independently or according to laws
which afford easy mathematical expressions. Properly chosen
experiments can easily be made to test these. But
apart from this there is a great basis of physical
chemistry underlying life with laws of its own, which
determine the subsequent curves of distribution.
Such curves can not be comprehended intelligently even
by the most general of the "generalised" curves.